Rational and Convergent Learning in Stochastic Games

Authors

  • Michael H. Bowling
  • Manuela M. Veloso

Abstract

This paper investigates the problem of policy learning in multiagent environments using the stochastic game framework, which we briefly overview. We introduce two properties as desirable for a learning agent when in the presence of other learning agents, namely rationality and convergence. We examine existing reinforcement learning algorithms according to these two properties and notice that they fail to simultaneously meet both criteria. We then contribute a new learning algorithm, WoLF policy hill-climbing, that is based on a simple principle: “learn quickly while losing, slowly while winning.” The algorithm is proven to be rational and we present empirical results for a number of stochastic games showing the algorithm converges.
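
The abstract states the WoLF principle only in words. As a rough illustration, here is a minimal Python sketch of policy hill-climbing with a WoLF step size, in the form commonly described for WoLF-PHC: a Q-learning backup, an incrementally maintained average policy, and a hill-climbing step on the mixed policy whose size is `delta_win` when the current policy earns more against Q than the average policy does ("winning") and the larger `delta_lose` otherwise. The class name `WoLFPHC`, the helper `matching_pennies`, the fixed learning rates and every default value are hypothetical choices for illustration, not taken from the paper; details the paper's experiments depend on, such as decaying rates and exploration schedules, are omitted.

```python
import random
from collections import defaultdict


class WoLFPHC:
    """Hypothetical sketch of a WoLF policy hill-climbing learner.

    Defaults are illustrative placeholders, not values from the paper.
    The WoLF condition is delta_lose > delta_win: move fast when losing.
    """

    def __init__(self, n_actions, alpha=0.1, gamma=0.9,
                 delta_win=0.01, delta_lose=0.04):
        assert n_actions >= 2 and delta_lose > delta_win > 0
        self.n = n_actions
        self.alpha, self.gamma = alpha, gamma
        self.delta_win, self.delta_lose = delta_win, delta_lose
        uniform = [1.0 / n_actions] * n_actions
        self.Q = defaultdict(lambda: [0.0] * n_actions)   # action values per state
        self.pi = defaultdict(lambda: list(uniform))      # current mixed policy
        self.avg_pi = defaultdict(lambda: list(uniform))  # running average policy
        self.count = defaultdict(int)                     # visits per state

    def act(self, s):
        # Sample from the current mixed policy (no extra exploration here).
        return random.choices(range(self.n), weights=self.pi[s])[0]

    def update(self, s, a, r, s_next):
        q, pi, avg = self.Q[s], self.pi[s], self.avg_pi[s]
        # (1) Ordinary Q-learning backup; target uses values before this update.
        target = r + self.gamma * max(self.Q[s_next])
        q[a] = (1 - self.alpha) * q[a] + self.alpha * target
        # (2) Incremental mean of the policies played so far in state s.
        self.count[s] += 1
        for b in range(self.n):
            avg[b] += (pi[b] - avg[b]) / self.count[s]
        # (3) Winning if the current policy beats the average policy under Q;
        #     ties count as losing.
        cur = sum(p * v for p, v in zip(pi, q))
        ref = sum(p * v for p, v in zip(avg, q))
        delta = self.delta_win if cur > ref else self.delta_lose
        # (4) Shift probability toward the greedy action, never removing more
        #     from an action than it currently holds (keeps pi a distribution).
        best = max(range(self.n), key=lambda b: q[b])
        moved = 0.0
        for b in range(self.n):
            if b != best:
                step = min(pi[b], delta / (self.n - 1))
                pi[b] -= step
                moved += step
        pi[best] += moved


def matching_pennies(steps=50_000, seed=0):
    """Self-play in matching pennies, treated as a one-state game (gamma=0)."""
    random.seed(seed)
    p1, p2 = WoLFPHC(2, gamma=0.0), WoLFPHC(2, gamma=0.0)
    for _ in range(steps):
        a1, a2 = p1.act(0), p2.act(0)
        r1 = 1.0 if a1 == a2 else -1.0  # player 1 wins on a match
        p1.update(0, a1, r1, 0)
        p2.update(0, a2, -r1, 0)
    return p1.pi[0], p2.pi[0]


if __name__ == "__main__":
    print(matching_pennies())
```

Matching pennies has a single Nash equilibrium in which each player mixes 50/50, so one would expect the policies returned by `matching_pennies()` to drift toward (0.5, 0.5). With the fixed step sizes used here they may keep oscillating in a band around that point rather than settle exactly.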

Similar resources

Balancing Two-Player Stochastic Games with Soft Q-Learning

Within the context of video games the notion of perfectly rational agents can be undesirable as it leads to uninteresting situations, where humans face tough adversarial decision makers. Current frameworks for stochastic games and reinforcement learning prohibit tuneable strategies as they seek optimal performance. In this paper, we enable such tuneable behaviour by generalising soft Q-learning...

Rational Learning in Imperfect Monitoring Games

This paper provides a general framework to analyze rational learning in strategic situations where the players have private priors, private information and there is a role for passive and active learning. The theory of statistical inference for stochastic processes and of Markovian dynamic programming is applied to study players' asymptotic behavior in the context of repeated and of recurring ga...

Coco-Q: Learning in Stochastic Games with Side Payments

Coco (“cooperative/competitive”) values are a solution concept for two-player normal-form games with transferable utility, when binding agreements and side payments between players are possible. In this paper, we show that coco values can also be defined for stochastic games and can be learned using a simple variant of Q-learning that is provably convergent. We provide a set of examples showing ...

Online Learning in Stochastic Games and Markov Decision Processes

In their standard formulations, stochastic games and Markov decision processes assume a rational opponent or a stationary environment. Online learning algorithms can adapt to arbitrary opponents and non-stationary environments, but do not incorporate the dynamic structure of stochastic games or Markov decision processes. We survey recent approaches that apply online learning to dynamic environm...

Publication date: 2001